Storage system that executes performance optimization that maintains redundancy

ABSTRACT

One storage area is selected from two or more storage areas of a high load physical storage device, a physical storage device with a lower load than that of the physical storage device is selected, and it is judged whether the redundancy according to the RAID level corresponding to the logical volume decreases when the data elements stored in the selected storage area are transferred to the selected low load physical storage device. If the result of this judgment is that the redundancy does not decrease, the data elements stored in the selected storage area are transferred to a buffer area of the selected low load physical storage device and the logical address space of the logical volume that corresponds to the selected storage area is associated with the buffer area.

CROSS-REFERENCE TO PRIOR APPLICATION

This application relates to and claims the benefit of priority from Japanese Patent Application number 2007-159303, filed on Jun. 15, 2007, the entire disclosure of which is incorporated herein by reference.

BACKGROUND

The present invention generally relates to the optimization of the performance of a storage system.

As technologies for distributing the load of a storage device, the technologies disclosed in Japanese Application Laid Open Nos. H7-56691 and 2006-53601, for example, are known. Japanese Application Laid Open No. H7-56691 discloses a load distribution technology for a plurality of disk devices that constitute a striping disk. Japanese Application Laid Open No. 2006-53601 discloses a load distribution technology for a logical storage device (logical volume).

Normally, a storage system performs storage control that utilizes RAID (Redundant Array of Independent (or Inexpensive) Disks) technology. More specifically, for example, a storage system comprises a RAID group (also known as a ‘parity group’ or an ‘array group’) that is constituted by two or more physical storage devices, and storage control that is adapted to the RAID level of the RAID group is carried out.

RAID levels that are generally adopted include RAID levels which, even when a fault occurs in one of the two or more physical storage devices constituting the RAID group, allow the data elements stored in the faulty physical storage device to be recovered (‘recoverable’ RAID levels hereinbelow) and, more specifically, RAID levels other than RAID0 (RAID1, RAID5, or RAID6, for example).

The physical storage device comprises a plurality of physical storage areas. The loads of two or more physical storage devices that constitute the same RAID group sometimes become uneven because of the access pattern. Hence, the distribution of the load of the physical storage devices and, more specifically, the re-arrangement of the data elements in physical storage area units is thought to be desirable.

Further, each time such load distribution is performed, redundancy must be maintained in cases where a recoverable RAID level is adopted. However, Japanese Application Laid Open No. H7-56691 concerns the load distribution of a plurality of disk devices constituting a striping disk, in other words, load distribution in cases where the RAID level is RAID0, and therefore load distribution that considers redundancy is not carried out. In addition, according to the technology disclosed in Japanese Application Laid Open No. 2006-53601, the load of the logical storage devices (logical volumes) rather than the load of the physical storage devices is distributed.

SUMMARY

Therefore, an object of the present invention is to distribute the load of the physical storage devices while maintaining the redundancy of the storage system.

Further objects of the present invention will become clear from the subsequent description.

One storage area is selected from two or more storage areas of a high load physical storage device, a physical storage device with a lower load than that of the physical storage device is selected, and it is judged whether the redundancy according to the RAID level corresponding to the logical volume decreases when the data elements stored in the selected storage area are transferred to the selected low load physical storage device. If the result of this judgment is that the redundancy does not decrease, the data elements stored in the selected storage area are transferred to a buffer area of the selected low load physical storage device, and a logical address space of the logical volume that corresponds to the selected storage area is associated with the buffer area.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an overall constitutional example of a computer system of an embodiment of the present invention;

FIG. 2 shows a constitutional example of a storage space provided by a physical disk;

FIG. 3 shows computer programs and data that are stored in memory;

FIG. 4 shows a constitutional example of an LU management table;

FIG. 5 shows a constitutional example of a zone management table;

FIG. 6 shows a constitutional example of a zone load management table;

FIG. 7 shows a constitutional example of a disk load management table;

FIG. 8 shows an example of the flow of I/O command processing;

FIG. 9 shows an example of the flow of system performance optimization processing;

FIG. 10 shows an example of the flow of load distribution processing that is executed in step 201 in FIG. 9;

FIG. 11 shows an example of the flow of swap feasibility judgment processing that is executed in step 302 in FIG. 10;

FIG. 12 shows an example of the flow of swap target search processing that is executed in step 402 in FIG. 11;

FIG. 13 shows an example of the flow of swap processing that is executed in step 303 of FIG. 10;

FIG. 14 shows an example of the flow of disk performance optimization processing that is executed in step 203 in FIG. 9;

FIG. 15 is an explanatory diagram that provides an overview of the disk performance optimization processing;

FIG. 16 shows a constitutional example of a storage space provided by a RAID group;

FIG. 17 shows the relationship between the respective areas and zone numbers of a disk medium of a physical disk; and

FIG. 18 shows a modified example of the constitution of the storage space provided by the physical disk.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

According to a first embodiment, in a storage system, access based on a RAID level corresponding to the logical volume is made by an access module to respective storage areas of each of the two or more physical storage devices constituting an access destination logical volume, and the storage system comprises a load calculation module that calculates the load of each of the physical storage devices based on a load that accompanies the access to each storage area; and a load distribution processing module that executes load distribution processing to distribute the loads of the plurality of physical storage devices. The load distribution processing module comprises a judgment module and a data re-arrangement module. In the load distribution processing, the judgment module selects one storage area from two or more storage areas of a high load physical storage device, selects a physical storage device with a lower load than that of the physical storage device, and judges whether the redundancy according to the RAID level (the RAID level corresponding to the logical volume in which the data elements are stored) decreases when the data elements stored in the selected storage area are transferred to the selected low load physical storage device. If the result of this judgment is that the redundancy does not decrease, in the load distribution processing, the data re-arrangement module transfers the data elements stored in the selected storage area to a buffer area of the selected low load physical storage device (a storage area used as a buffer that does not correspond to a logical address space of the logical volume) and associates the logical address space of the logical volume that corresponds to the selected storage area with the buffer area.

According to a second embodiment, in the first embodiment, the storage system further comprises volume management information representing which storage area of which physical storage device each of the plurality of data elements written to volume element areas constituting the logical volume is written to. This information is stored in a storage area in the storage system, for example. The judgment of whether the redundancy decreases is a judgment that is performed by referencing the volume management information and is a judgment of whether the physical storage device having the selected storage area is the selected low load physical storage device, for the same volume element area.
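The redundancy judgment described above can be illustrated by the following minimal sketch. It assumes one reading of the text, namely that redundancy decreases when the selected low load device already holds another data element of the same volume element area. The `VolumeMap` layout, the function name `redundancy_decreases`, and the device labels are hypothetical and are used for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the volume management information:
# volume element area (stripe) number -> list of (device, zone) pairs
# holding the data elements written to that area.
VolumeMap = Dict[int, List[Tuple[str, int]]]


def redundancy_decreases(volume_map: VolumeMap,
                         source: Tuple[str, int],
                         target_device: str) -> bool:
    """Return True if moving the element stored at `source` to
    `target_device` would place two elements of the same volume
    element area on one physical storage device."""
    for placements in volume_map.values():
        if source in placements:
            # Devices holding the other elements of the same area.
            others = [dev for (dev, zone) in placements if (dev, zone) != source]
            return target_device in others
    raise KeyError(f"{source} is not mapped to any volume element area")
```

Under this assumption, a transfer is permitted only when the function returns False for the selected pair of storage area and low load device.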

According to a third embodiment, in at least one of the first and second embodiments, after data elements have been transferred from the selected storage area by the data re-arrangement module, the storage area is established as the buffer area. More specifically, for example, an address space representing an invalid value is associated as the logical address space associated with the storage area, or a value representing the buffer area is associated with the storage area as information representing the attribute of the storage area.

According to a fourth embodiment, in at least one of the first to third embodiments, in cases where the judgment result is that the redundancy decreases, the judgment module selects a physical storage device which has a lower load than that of the high load physical storage device and has a higher load than the physical storage device that is selected in the previous judgment. A judgment of whether the redundancy decreases can be performed once again for the selected physical storage device.

According to a fifth embodiment, in at least one of the first to fourth embodiments, in cases where the high load physical storage device is a physical storage device with the Kth (K is an integer) highest load among the plurality of physical storage devices, the low load physical storage device that is initially selected is the physical storage device with the Kth lowest load among the plurality of physical storage devices.
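The selection order of the fourth and fifth embodiments can be sketched as follows, purely as an illustration. The function name `candidate_targets` and the device labels are hypothetical, and the order in which devices with equal loads are ranked is an assumption that the text does not specify.

```python
from typing import Dict, List


def candidate_targets(load_ratio: Dict[str, float], source: str) -> List[str]:
    """List transfer destination candidates for `source` in the order in
    which they would be tried: the Kth lowest load device first, then
    devices with successively higher loads, stopping before any device
    whose load is not lower than that of `source`."""
    ranked = sorted(load_ratio, key=lambda d: load_ratio[d], reverse=True)
    k = ranked.index(source)        # 0-based rank of the high load device
    ascending = ranked[::-1]        # lowest load first
    return [d for d in ascending[k:] if load_ratio[d] < load_ratio[source]]
```

For loads of 90, 70, 40 and 10 on devices A, B, C and D, the candidates for A are D, C and B in that order, and the only candidate for B is C.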

According to a sixth embodiment, in at least one of the first to fifth embodiments, the judgment module also judges, based on the load of the selected storage area, whether the load of the selected low load physical storage device exceeds a predetermined value in cases where the data elements are transferred, and if the load exceeds the predetermined value, selects a storage area with a lower load than that of the previously selected storage area, from the two or more storage areas of the high load physical storage device.

According to a seventh embodiment, in at least one of the first to sixth embodiments, the data re-arrangement module transfers data elements that are stored in a storage area selected from two or more storage areas of the low load physical storage device to the buffer area of the high load physical storage device. That is, data elements stored in the first storage area selected from the high load physical storage device are copied to a buffer area in the low load physical storage device and data elements stored in the second storage area selected from the low load physical storage device are copied to a buffer area of the high load physical storage device.

According to an eighth embodiment, in the seventh embodiment, the judgment module also judges whether the load of the second storage area is higher than the load of the first storage area selected from the high load physical storage device, and if the former is higher than the latter, selects a physical storage device which has a lower load than the high load physical storage device and which is other than the low load physical storage device.

According to a ninth embodiment, in at least one of the first to eighth embodiments, the storage system further comprises a storage device optimization module that executes storage device optimization processing for a certain physical storage device among the plurality of physical storage devices. In the storage device optimization processing, for the certain physical storage device, data elements stored in a first storage area with high-speed access and a low load are copied to a buffer area of the certain physical storage device, and data elements stored in a second storage area with lower-speed access than the first storage area and with a high load are copied to the first storage area, and the logical address space associated with the second storage area is associated with the first storage area while the logical address space associated with the first storage area is associated with the buffer area.

According to a tenth embodiment, in the ninth embodiment, the storage device optimization processing is executed after the load distribution processing.

According to an eleventh embodiment, in at least one of the ninth and tenth embodiments, in the storage device optimization processing, data elements stored in the second storage area are also copied to the first storage area, whereupon the second storage area is established as a buffer area.
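The copy order of the ninth and eleventh embodiments can be pictured with the following minimal sketch. It assumes that each storage area carries its logical address range together with its data and that a buffer area is represented by `None`; the `Zone` type and the function name `promote_hot_zone` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical zone contents: (logical address range, data), or None
# when the zone is currently serving as a buffer area.
Zone = Optional[Tuple[Tuple[int, int], bytes]]


def promote_hot_zone(zones: Dict[int, Zone],
                     first: int, second: int, buffer: int) -> int:
    """Move the high load element in `second` into the faster `first`
    area, parking the previous content of `first` in `buffer`.
    Returns the zone number that becomes the new buffer area."""
    if zones[buffer] is not None:
        raise ValueError("buffer zone is in use")
    zones[buffer] = zones[first]   # low load element and its addresses -> buffer
    zones[first] = zones[second]   # high load element and its addresses -> fast area
    zones[second] = None           # vacated area is established as the buffer
    return second
```

Because the logical address range travels with the data in this sketch, the re-association of logical address spaces described in the ninth embodiment happens as part of each assignment.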

According to a twelfth embodiment, in at least one of the ninth to eleventh embodiments, the certain physical storage device is a disk-medium drive. The first storage area is a storage area that exists closer to the outer periphery of the disk medium than the second storage area.

According to a thirteenth embodiment, in at least one of the ninth to twelfth embodiments, the storage system further comprises a dispersion extent judgment module that judges whether the dispersion extent of the load of the plurality of physical storage devices is equal to or less than a predetermined extent. When it is judged that the dispersion extent of the load is equal to or less than the predetermined extent, the storage device optimization processing is performed without performing the load distribution processing.

According to a fourteenth embodiment, in at least one of the ninth to thirteenth embodiments, the certain physical storage device has two or more buffer areas. The storage device optimization module is able to perform storage device optimization processing by using the two or more buffer areas in parallel for two or more first storage areas of the certain physical storage device.

According to a fifteenth embodiment, in at least one of the ninth to fourteenth embodiments, respective storage area identifiers of two or more storage areas of the certain physical storage device are serial numbers. In cases where the load rankings of the respective storage areas are set as storage area identifiers, the copy destination of the data elements stored in the respective storage areas is a storage area that is identified from the storage area identifiers.

Two or more embodiments of the plurality of above embodiments can be combined. Further, the respective modules (the judgment module, data re-arrangement module, storage device performance optimization module, for example) can be constructed by hardware, a computer program or a combination thereof (some of the parts are implemented by a computer program while the remainder are implemented by hardware, for example). The computer program is read to a predetermined processor and executed. Further, in the event of information processing, in which the computer program is read to the processor and executed, a storage area that exists on hardware resources such as memory may be used. In addition, the computer program may be installed on the computer from a recording medium such as a CD-ROM or may be downloaded to the computer via a communication network.

An embodiment of the present invention will be described hereinbelow in detail with reference to the drawings.

FIG. 1 shows an overall constitutional example of the computer system according to an embodiment of the present invention.

In FIG. 1, a storage system 10 is constituted by a control section 100 that performs control of the whole storage system 10 and a disk section 200 in which data are stored.

A host computer (a higher level device (application server, for example) that utilizes the storage system 10) 300 is connected to the storage system 10 via a host adapter 110 of the control section 100. The interface for connecting the host computer 300 to the storage system 10 uses a SAN (Storage Area Network) 180, for example. In the architecture of the SAN 180, for example, Fibre Channel, SCSI (Small Computer System Interface), iSCSI (internet Small Computer System Interface), USB (Universal Serial Bus), an IEEE1394 bus or the like can be used. In addition, a plurality of host computers 300 may be connected to the storage system 10. Furthermore, another type of interface may also be adopted instead of the SAN 180.

The host computer 300 has control software for controlling the operation of the storage system 10 installed thereon and, using the control software that is executed by the host computer 300, commands and so forth can be issued to the storage system 10 to control the operation of the storage system 10. The control software that is executed by the storage system 10 and host computer 300 is distributed via a LAN (Local Area Network) 190, for example. With regard to the computer for performing management, control or maintenance of the storage system 10, a computer other than the host computer 300 may also be used. In addition, another type of communication network may also be utilized instead of the LAN 190.

The control section 100 comprises a host adapter 110 to which the host computer 300 is connected and which communicates with the host computer 300, a CPU (Central Processing Unit) 120 that performs overall control of the storage system 10, and a memory 130 on which a computer program and data and so forth that are required for the CPU 120 to control the storage system 10 are stored. In addition, the control section 100 comprises a cache memory 140 in which data that are communicated between the host computer 300 and disk section 200 are temporarily stored, ASICs (application specific integrated circuits) 150 which compute parity data, and a disk adapter 160 that is connected to the respective physical disks 2100 constituting the disk section 200 and which communicates with the respective physical disks 2100.

The disk section 200 comprises a plurality of disk boxes 210 and comprises a plurality of physical disks 2100 in the respective disk boxes 210. A RAID group is constituted by two or more physical disks 2100 among the plurality of physical disks 2100. The physical disk 2100 is a hard disk drive (HDD), for example, but may also be another type of physical disk drive such as a DVD (Digital Versatile Disk) drive, for example. In addition, another type of physical storage device such as a semiconductor memory drive (a flash memory drive, for example) may be adopted in place of the physical disk drive.

FIG. 2 shows a constitutional example of the storage space provided by the physical disk 2100. FIG. 16 shows a constitutional example of the storage space provided by the RAID group.

As shown in FIG. 16, based on the storage space provided by the RAID group 2500, one or more logical storage devices (called LUs (Logical Units) hereinbelow) are formed. One LU is constituted by a part of the storage space of the respective physical disks 2100 constituting the RAID group. The LU is constituted by a plurality of storage areas of a predetermined size. A storage area of this predetermined size is referred to as a ‘stripe’ for convenience.

As shown in FIG. 2, the storage area provided by the physical disk 2100 is constituted by n storage areas of a predetermined size. The storage areas of a predetermined size are called ‘zones’ (shown as ‘Zone’ in FIG. 2). A ‘zone’ as it is intended here is a constituent element of a stripe. A stripe has a plurality of data elements and a redundant data element computed based on the plurality of data elements written therein, and one zone has one data element or one redundant data element written therein. The size of a zone is uniform for all of the physical disks 2100 and the size can be changed by control software that is executed by the host computer 300.

A data area 2200 is constituted by n-1 zones among the n zones and a swap area 2300 is constituted by one zone. The data area 2200 is used as a write destination for user data elements and redundant data elements and the swap area 2300 is used as a temporary buffer area when the data elements are re-arranged.

FIG. 3 shows a computer program and data stored in the memory 130.

The memory 130 stores a program group 1300 that is executed by the CPU 120, an LU management table 1310 that records information relating to the association between the logical address space of an LU 2400 and the physical address space of the physical disks 2100, a zone management table 1320 that records zone-related information, a zone load management table 1330 that records information relating to the load states of each zone, and a disk load management table 1340 that records information relating to the load states of the respective physical disks 2100.

Here, the LU management table 1310 is created for each LU 2400 and the zone management table 1320 and zone load management table 1330 are created for each physical disk 2100.

In addition, the program group 1300 includes, for example, an I/O control program 1301 that processes I/O commands received from the host computer 300 and updates the various load management tables 1330 and 1340, a load distribution program 1302 that distributes the load between the physical disks, and a disk performance optimization program 1303 that optimizes the performance of the physical disks 2100. The load distribution program 1302 has a swap feasibility judgment program 13021 that judges the feasibility of a data element swap, a swap target search program 13022 that searches for swap targets, a swap program 13023 that performs data element swaps, and a load distribution control program 13024 that controls the execution of load distribution processing. In cases where a computer program is the subject of a sentence hereinbelow, the processing is actually executed by the CPU 120 that executes the computer program.

FIG. 4 shows a constitutional example of the LU management table 1310.

The LU management table 1310 has a column 1311 in which a stripe number is written, a column 1312 in which the RAID level is written, a column 1313 in which information relating to the logical address space is written, and a column 1314 in which information relating to the physical address space is written. Column 1313 has a column 13131 in which the start LBA (Logical Block Address) of the logical address space is written and a column 13132 in which the end LBA is written. Column 1314 has a column 1315 in which information representing the physical position of the storage area in which a user data element exists is written, a column 1316 in which information representing the position of the physical storage area in which a redundant data element 1 exists is written, and a column 1317 in which information representing the position of the physical storage area in which a redundant data element 2 exists is written. The columns 1315, 1316, and 1317 have columns 13151, 13161, and 13171 respectively, in which the HDD number is written, and columns 13152, 13162, and 13172 respectively, in which the zone number is written. The LU management table 1310 records one record for each stripe of a single LU. One record is constituted by the number of the stripe, the RAID level of the RAID group having the LU, information relating to the logical address space corresponding to the stripe (start LBA and end LBA), the HDD number and zone number in which a user data element exists, the HDD number and zone number in which redundant data element 1 exists, and the HDD number and zone number in which redundant data element 2 exists. It can be seen from this record which logical address space and physical address space are associated with which stripe and by which zone of which physical disk 2100 the corresponding physical address space is constituted.

Further, ‘redundant data element 1’ is a redundant data element that is created in cases where the redundancy is 1, for example. More specifically, for example, redundant data element 1 is a copy of the original user data element created in cases where the RAID level is RAID1 (mirrored user data element), a parity data element that is created in cases where the RAID level is RAID5, or one of the two redundant data elements created in cases where the RAID level is RAID6. ‘Redundant data element 2’ is a redundant data element that is created in cases where the redundancy is 2, for example. More specifically, for example, redundant data element 2 is the other of the two redundant data elements created in cases where the RAID level is RAID6. Hence, depending on the RAID level of the RAID group in which the LU 2400 corresponding to the table 1310 exists, an invalid value (“N/A”, for example) is recorded as the HDD number and zone number corresponding to redundant data element 1 or 2.

In addition, the HDD number recorded in table 1310 is expressed by a combination of two kinds of numbers, where the first of these two kinds of numbers is the number of the disk box 210 and the other of these two kinds of numbers is the number of the physical disk (HDD) 2100 in the disk box 210.
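One record of the LU management table 1310 can be modelled, for illustration only, as the data structure below. The class and field names are hypothetical, the "box-disk" string form of the HDD number is an assumed notation, and an absent redundant data element (the “N/A” case) is represented by `None`.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class ZoneLocation:
    disk_box: int   # number of the disk box 210
    hdd: int        # number of the physical disk 2100 within the disk box
    zone: int       # zone number on that physical disk

    @property
    def hdd_number(self) -> str:
        # Assumed notation for the two-part HDD number.
        return f"{self.disk_box}-{self.hdd}"


@dataclass
class StripeRecord:
    stripe_number: int
    raid_level: str
    start_lba: int
    end_lba: int
    user_data: ZoneLocation
    redundant_1: Optional[ZoneLocation] = None   # None stands for "N/A"
    redundant_2: Optional[ZoneLocation] = None   # None stands for "N/A"

    def hdd_numbers(self) -> Set[str]:
        """HDD numbers of all physical disks that hold an element of this stripe."""
        locations = (self.user_data, self.redundant_1, self.redundant_2)
        return {loc.hdd_number for loc in locations if loc is not None}
```

A RAID1 stripe, for example, would populate `user_data` and `redundant_1` and leave `redundant_2` as `None`.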

FIG. 5 shows a constitutional example of the zone management table 1320.

The zone management table 1320 has a column 1321 in which the zone number is written, a column 1322 in which the zone attribute is written, a column 1323 in which the LU number is written, a column 1324 in which information relating to the logical address space is written, a column 1325 in which information relating to the physical address space is written, and a column 1326 in which optimal position information is written. Column 1324 has a column 13241 in which the start LBA of the logical address space is written and a column 13242 in which the end LBA is written. Column 1325 has a column 13251 in which the start LBA of the physical address space is written and a column 13252 in which the end LBA is written. One record is recorded for each single zone. One record is constituted by the zone number, zone attribute, LU number, start LBA and end LBA of the logical address space, start LBA and end LBA of the physical address space, and optimal position information. It can be seen from one record which zones correspond to which logical address spaces and which physical address spaces.

In this embodiment, as exemplified in FIG. 17, the zones corresponding to the storage areas that exist on the outer periphery of the disk media of the physical disks 2100 have small zone numbers allocated thereto and the zones that correspond to the storage areas that exist on the inner periphery of the disk media have large zone numbers allocated thereto.

In addition, the zone attribute represents the state of the zone and can be expressed by one of three values, ‘Data’, ‘swap’ and ‘N/A’, for example. Here, ‘Data’ (shown as “D” in FIG. 5) represents a state where a user data element or redundant data element is stored in the zone; ‘swap’ (shown as “Swap” in FIG. 5) represents the fact that the zone is in a state of being used as a temporary buffer zone during re-arrangement of data elements; and ‘N/A’ represents the fact that the zone is an area for data element storage but is an unused space.

The optimal position information is information indicating which zone a data element stored in the zone is to be stored in and is, more specifically, the zone number of the zone constituting the transfer destination for the data element, for example.

FIG. 6 shows a constitutional example of the zone load management table 1330.

The zone load management table 1330 has a column 1331 in which the zone number is written, a column 1332 in which the number of commands is written, and a column 1333 in which the load ranking is written. The zone number, number of commands, and load ranking are written for each single zone.

Here, the number of commands is the access frequency with respect to the corresponding zone and is incremented each time the zone is accessed.

In addition, the load ranking indicates the order of the size of the load for each zone so that zones with higher loads have a lower number allocated thereto. More specifically, the zone with the highest load has the number “0” allocated thereto.
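As an illustration, the load ranking can be derived from the number of commands as in the following sketch. The function name `load_rankings` is hypothetical, and breaking ties in favour of the smaller zone number is an assumption that the text does not state.

```python
from typing import Dict


def load_rankings(command_counts: Dict[int, int]) -> Dict[int, int]:
    """Map each zone number to its load ranking, where the zone with the
    largest number of commands receives ranking 0."""
    ordered = sorted(command_counts, key=lambda zone: (-command_counts[zone], zone))
    return {zone: rank for rank, zone in enumerate(ordered)}
```

For command counts of 5, 20 and 10 on zones 0, 1 and 2, zone 1 receives ranking 0, zone 2 receives ranking 1 and zone 0 receives ranking 2.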

FIG. 7 shows a constitutional example of the disk load management table 1340.

The disk load management table 1340 has a column 1341 in which the HDD number is written and a column 1342 in which the load ratio is written. The HDD number and load ratio are recorded for each single physical disk 2100. The load ratio is calculated based on the IOPS (access frequency per second) of the physical disk 2100 and the number of commands for each zone which is recorded in the zone load management table 1330 corresponding to the physical disk 2100, for example.

The content of the LU management table 1310, zone management table 1320, zone load management table 1330, and disk load management table 1340 can be confirmed by the user of the host computer 300 by utilizing control software that is executed by the host computer 300. In other words, the user is able to determine which logical address space of which LU is associated with which zone and obtain the load status and so forth in physical disk units and zone units or the like.

The processing that is executed in this embodiment will be described next.

FIG. 8 shows an example of the flow of I/O command processing. The steps in FIG. 8 are abbreviated as ‘S’.

The I/O control program 1301 executes I/O processing in response to the I/O command (read command or write command) that is received from the host computer 300 (step 100). More specifically, for example, the I/O control program 1301 specifies, based on the LU number and LBA designated by the I/O command, the physical address space that corresponds to the logical address space specified by the LU number and LBA and accesses the zone corresponding to the specified physical address space.
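The address resolution in step 100 can be sketched as a range lookup, as shown below. This is only an illustration: the `Mapping` tuple is a hypothetical, simplified stand-in for the records of one LU, and a linear search is assumed for brevity.

```python
from typing import List, NamedTuple, Tuple


class Mapping(NamedTuple):
    start_lba: int     # first logical block address covered (inclusive)
    end_lba: int       # last logical block address covered (inclusive)
    hdd_number: str    # physical disk that holds the zone
    zone_number: int   # zone on that physical disk


def resolve_zone(lu_table: List[Mapping], lba: int) -> Tuple[str, int]:
    """Return the (HDD number, zone number) whose logical address space
    contains `lba` within one LU."""
    for entry in lu_table:
        if entry.start_lba <= lba <= entry.end_lba:
            return entry.hdd_number, entry.zone_number
    raise ValueError(f"LBA {lba} is outside the logical address space of this LU")
```

In a complete lookup the LU number designated by the I/O command would first select which table to search.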

Thereafter, in accordance with the I/O processing in step 100, the I/O control program 1301 updates the number of commands and load ranking of the zone load management table 1330 that corresponds to the accessed zone (step 101).

In addition, the I/O control program 1301 calculates the load ratio of the physical disk 2100 holding the accessed zone based on the updated number of commands of the zone load management table 1330 and overwrites the existing load ratio that corresponds to the physical disk 2100 (the load ratio recorded in the disk load management table 1340) with the calculated load ratio (step 102).

As a result of the above I/O command processing, the zone load management table 1330 and disk load management table 1340 are updated each time the I/O processing is performed in response to the I/O command.

FIG. 9 shows an example of the flow of the system performance optimization processing.

The load distribution control program 13024 judges whether the load distribution processing is required (step 200). The load distribution control program 13024 references the disk load management table 1340, for example, and, if there is a set of physical disks 2100 for which the difference in the load ratio is equal to or more than a predetermined value, judges that the load distribution processing is required. If there is no such set of physical disks 2100, the load distribution control program 13024 judges that the load distribution processing is not required.
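The judgment in step 200 can be sketched as follows. A pair of physical disks whose load ratios differ by the threshold or more exists exactly when the largest and smallest load ratios differ by that much, so the sketch compares only those two values. The function name and the sample values are hypothetical.

```python
from typing import Dict


def load_distribution_required(load_ratio: Dict[str, float], threshold: float) -> bool:
    """Return True if some pair of physical disks differs in load ratio
    by `threshold` or more."""
    if len(load_ratio) < 2:
        return False
    values = list(load_ratio.values())
    return max(values) - min(values) >= threshold
```

With load ratios of 80, 55 and 30 and a threshold of 40, for example, the function returns True because the extremes differ by 50.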

In cases where it is judged that the load distribution processing is required (step 200: Yes), the load distribution control program 13024 executes load distribution processing (step 201) and, thereafter, causes the disk performance optimization program 1303 to execute step 202. However, in cases where the load distribution processing is not required (step 200: No), the load distribution control program 13024 skips step 201 and causes the disk performance optimization program 1303 to execute step 202.

The disk performance optimization program 1303 judges whether the disk performance optimization is required (step 202). More specifically, for example, for each physical disk 2100, the disk performance optimization program 1303 references the respective load ranking recorded in the zone load management table 1330 and judges whether the zones on the inner periphery of the physical disk 2100 have a higher load than the zones on the outer periphery thereof. In cases where it is judged that the zones on the inner periphery of the physical disk 2100 have a higher load than the zones on the outer periphery, disk performance optimization is judged to be required and, if not, disk performance optimization is judged not to be required.
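One possible reading of the step 202 judgment is sketched below. It assumes, purely for illustration, that zone numbers ascend from the outer periphery toward the inner periphery and that the load of a zone is represented by its number of commands; the function name is hypothetical.

```python
def optimization_required(zone_commands):
    """Step 202 (sketch): True if some inner zone carries a higher load than
    a zone further out. Index 0 is assumed to be the outermost zone."""
    # Checking adjacent pairs is sufficient: if no zone exceeds its outer
    # neighbour, the loads are non-increasing from the outside inward.
    return any(inner > outer
               for outer, inner in zip(zone_commands, zone_commands[1:]))
```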

In cases where it is judged that disk performance optimization is required (step 202: Yes), the disk performance optimization program 1303 executes disk performance optimization (step 203) and ends the system performance optimization processing. On the other hand, in cases where it is judged that disk performance optimization is not required (step 202: No), the disk performance optimization program 1303 ends the system performance optimization processing without further processing.

Here, system performance optimization processing is started by starting up the load distribution control program 13024 of the load distribution program 1302 at regular or irregular timing, for example. As regular timing, arbitrary conditions can be adopted, such as execution at fixed times using a timer and execution each time the number of commands received from the host computer 300 reaches a multiple of a specified number. Examples of irregular timing include cases where the administrator of the storage system 10 uses control software that is installed on the host computer 300 to instruct the execution of this processing, cases where the storage system 10 receives a specified command, and cases where the CPU 120 executes a specified command.

FIG. 10 shows an example of the flow of load distribution processing that is performed in step 201 of FIG. 9.

The load distribution control program 13024 references the disk load management table 1340 and judges whether all of the physical disks 2100 have reached a processing limit (step 300). In cases where at least one physical disk 2100 has not reached the processing limit (step 300: No), the load distribution control program 13024 selects one of the physical disks 2100 that have reached the processing limit as the processing target disk (step 301). However, in cases where all of the physical disks 2100 have reached the processing limit (step 300: Yes), the load distribution control program 13024 ends the load distribution processing. This is because it is difficult to avoid the processing limit state even when data elements are re-arranged between physical disks.

After step 301, the load distribution control program 13024 calls the swap feasibility judgment program 13021. The swap feasibility judgment program 13021 ascertains whether load distribution of the processing target disk is possible by means of the re-arrangement of data elements between the physical disks 2100 (whether a swap is possible) (step 302).

The load distribution control program 13024 moves to step 304 in cases where the returned result of ascertaining the swap feasibility is that a swap is impossible (step 302: No). However, in cases where the returned result is that a swap is possible (step 302: Yes), the load distribution control program 13024 calls the swap program 13023. The swap program 13023 executes a switch (that is, a swap) of data elements between the swap destination disk and processing target disk (step 303). Thereafter, the load distribution control program 13024 ascertains, based on the disk load management table 1340, whether there is a physical disk for which the processing limit has been reached in addition to the processing target disk (step 304). In cases where such a physical disk 2100 exists, the load distribution control program 13024 returns to step 301 and, in cases where no such physical disk 2100 exists, the load distribution control program 13024 terminates the load distribution processing.
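The control flow of FIG. 10 might be sketched as the loop below. The callbacks `at_limit`, `find_swap` and `do_swap` are hypothetical placeholders for the processing-limit check, the swap feasibility judgment (step 302) and the swap itself (step 303). The `tried` set is an assumption added so that a disk for which no swap is possible is not selected again; the text does not state how repeated selection is avoided.

```python
def distribute_load(disks, at_limit, find_swap, do_swap):
    """Sketch of FIG. 10. Returns the list of (source, destination) swaps made."""
    if all(at_limit(d) for d in disks):          # step 300: Yes -> nothing to gain
        return []
    swapped, tried = [], set()
    while True:
        # Steps 301/304: pick a not-yet-processed disk that is at its limit.
        candidates = [d for d in disks if at_limit(d) and d not in tried]
        if not candidates:
            break
        target = candidates[0]
        tried.add(target)
        dest = find_swap(target)                 # step 302: None means "No"
        if dest is not None:
            do_swap(target, dest)                # step 303
            swapped.append((target, dest))
    return swapped
```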

FIG. 11 shows an example of the flow of swap feasibility judgment processing that is performed in step 302 in FIG. 10.

The swap feasibility judgment program 13021 selects the processing target disk that is selected in step 301 as the swap source disk (step 400) and, based on the zone load management table 1330 corresponding to the swap source disk, selects the zone with the maximum load ratio in the swap source disk as the swap source zone (step 401).

Thereafter, the swap feasibility judgment program 13021 calls the swap target search program 13022. The swap target search program 13022 searches for the physical disk and zone constituting the swap target (step 402). In cases where it is judged in step 402 that there is a swap target, the swap feasibility judgment program 13021 selects the return values from the swap target search program 13022 of step 402 as the swap destination disk and swap destination zone respectively (step 403) and then returns a ‘Yes’, which indicates that a swap is possible, to the load distribution control program 13024 and ends step 302. However, in cases where it is judged in step 402 that there is no swap target, the swap feasibility judgment program 13021 returns a ‘No’, which indicates that a swap is impossible, to the load distribution control program 13024 and ends step 302.

FIG. 12 shows an example of the flow of swap target search processing that is performed in step 402 of FIG. 11.

The swap target search program 13022 references the disk load management table 1340 and selects the physical disk with the lowest load ratio as the swap destination disk (step 500). The swap target search program 13022 checks whether the redundancy of the data stored in the swap source zone (the zone selected in step 401) and the swap destination zone would decrease as a result of executing the swap (step 501). More specifically, for example, the swap target search program 13022 references the LU management table 1310 and ascertains that data elements (user data elements and redundant data elements) that exist in the same stripe as the data elements stored in the swap source zone are not stored in the swap destination disk and that data elements (user data elements and redundant data elements) that exist in the same stripe as the data elements stored in the swap destination zone are not stored in the swap source disk. In other words, assuming that the LU management table 1310 has been updated by the swap, for example, it is ascertained whether the same HDD number would then appear more than once in one record of the LU management table 1310. Redundancy is judged to decrease if the same HDD number appears more than once.

Here, cases where the redundancy decreases include, for example, cases where two or more user data elements or redundant data elements (parity data elements) of the same stripe of an LU constituted by RAID5 exist on the same physical disk, cases where master data elements (original user data elements) and mirrored data elements (redundant data elements) of the same stripe of an LU constituted by RAID1 exist on the same physical disk, and so forth. This is because, when a fault arises in that physical disk, the data elements of that stripe that were stored on the disk can no longer be recovered from the remaining data elements.
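The redundancy check of step 501 can be illustrated with a small sketch. It assumes a simplified, hypothetical form of the LU management table 1310 in which each record is a list of (HDD number, zone number) pairs for the data elements of one stripe; the function name and this layout are illustrative assumptions, not the table format of the embodiment.

```python
def redundancy_kept_after_swap(lu_table, src, dst):
    """Return True if swapping zones `src` and `dst` (each an (hdd, zone) pair)
    leaves every record with at most one data element per HDD number."""
    for record in lu_table:
        # Apply the swap to a copy of the record, as if the table were updated.
        updated = [dst if loc == src else src if loc == dst else loc
                   for loc in record]
        hdds = [hdd for hdd, _zone in updated]
        if len(hdds) != len(set(hdds)):   # same HDD number twice in one record
            return False
    return True
```

For a RAID5 stripe stored on HDDs 0 to 3, moving the element on HDD 0 to a zone of HDD 1 would place two elements of the stripe on HDD 1 and is rejected, whereas moving it to HDD 4 is accepted provided that the stripe stored in the destination zone has no element on HDD 0.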

In cases where it is ascertained that the redundancy decreases (step 501: Yes), the swap target search program 13022 ascertains whether there is a physical disk 2100 with the next higher load ratio after that of the physical disk 2100 currently selected as the swap destination disk (step 502). When such a physical disk 2100 exists (step 502: Yes), the swap target search program 13022 ascertains whether the physical disk has reached the processing limit based on the disk load management table 1340 (step 503). This is because load distribution cannot be achieved even when a swap (that is, a data element switch) is executed in cases where the newly selected swap destination disk has reached the processing limit.

In cases where the currently selected physical disk has not reached the processing limit (step 503: No), the swap target search program 13022 selects the selected physical disk as the new swap destination disk (step 504) and returns to step 501.

However, in cases where there is no applicable physical disk in step 502 (step 502: No) and cases where the selected physical disk is in the processing limit state in step 503 (step 503: Yes), the swap target search program 13022 returns a ‘No’, which indicates that there is no swap target, to the swap feasibility judgment program 13021 and ends step 402 (step 505).

In cases where it is judged in step 501 that the redundancy does not decrease (step 501: No), the swap target search program 13022 references the zone load management table 1330 corresponding to the selected swap destination disk and selects the zone with the lowest load ratio as the swap destination zone (step 506).

Thereafter, the swap target search program 13022 judges whether the swap destination disk would reach the processing limit as a result of the swap (step 507). This judgment is made based on whether the number of commands after the swap (obtained by replacing the number of commands of the swap destination zone with the number of commands of the swap source zone and totaling the replaced number of commands and the number of commands of the other zones of the swap destination disk) exceeds the number of commands that the swap destination disk can process.

In cases where it is ascertained in step 507 that the swap destination disk would reach the processing limit (step 507: Yes), the swap target search program 13022 selects the zone with the next highest load ratio after that of the currently selected swap source zone as the new swap source zone (step 508).

Thereafter, the swap target search program 13022 ascertains whether the load ratio of the swap source zone is higher than the load ratio of the swap destination zone (step 509).

In step 509, in cases where the load ratio of the swap source zone is higher than the load ratio of the swap destination zone (step 509: Yes), the swap target search program 13022 returns to step 507. However, in cases where the load ratio of the swap source zone is lower than the load ratio of the swap destination zone (step 509: No), the swap target search program 13022 returns to step 502 (that is, another search is executed to determine whether there is a physical disk that is appropriate as a swap destination disk). This is because, in the latter case, the load of the swap source disk would become even higher as a result of the swap execution, and swap execution is therefore not preferable.

In cases where it is ascertained in step 507 that the swap destination disk has not reached the processing limit as a result of the swap (step 507: No), the swap target search program 13022 returns, in addition to a ‘Yes’, which indicates that there is a swap target, the currently selected swap destination disk (the HDD number thereof, for example) together with the swap destination zone (the zone number thereof, for example) to the swap feasibility judgment program 13021 and ends step 402 (step 510).
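The search of FIG. 12 can be condensed into the sketch below, under several stated assumptions. The `disks` mapping (HDD number to a dict with an assumed processable command count `max` and per-zone command counts `zones`), the `redundancy_decreases` callback, treating a load ratio of 1.0 or more as the processing limit, and the handling of the case where no further swap source zone remains are all hypothetical. Unlike step 510, the sketch also returns the swap source zone, because step 508 may change it.

```python
def find_swap_target(disks, src_disk, redundancy_decreases):
    """Sketch of FIG. 12. Returns (src_zone, dest_disk, dest_zone) or None."""
    def ratio(d):
        return sum(disks[d]["zones"]) / disks[d]["max"]

    src_load = disks[src_disk]["zones"]
    # Source zones from highest to lowest load (step 401 picks the first).
    src_zones = sorted(range(len(src_load)), key=lambda z: -src_load[z])
    s = 0
    # Steps 500/502: candidate destinations in ascending order of load ratio.
    for dest in sorted((d for d in disks if d != src_disk), key=ratio):
        if ratio(dest) >= 1.0:                                  # step 503: Yes
            return None                                         # step 505
        if redundancy_decreases((src_disk, src_zones[s]), dest):  # step 501: Yes
            continue                                            # step 502
        dest_load = disks[dest]["zones"]
        dz = min(range(len(dest_load)), key=dest_load.__getitem__)  # step 506
        while True:
            after = sum(dest_load) - dest_load[dz] + src_load[src_zones[s]]
            if after <= disks[dest]["max"]:                     # step 507: No
                return src_zones[s], dest, dz                   # step 510
            if s + 1 == len(src_zones):   # no further source zone (assumed handling)
                break
            s += 1                                              # step 508
            if src_load[src_zones[s]] <= dest_load[dz]:         # step 509: No
                break                                           # back to step 502
    return None                                                 # step 505
```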

FIG. 13 shows an example of the flow of swap processing that is performed in step 303 of FIG. 10.

The swap program 13023 starts processing to copy data elements stored in the swap source zone to the swap zone on the swap destination disk (the zone constituting the swap area 2300) (step 600).

Thereafter, the swap program 13023 copies data elements stored in the swap destination zone to the swap zone on the swap source disk (step 601).

Subsequently, the swap program 13023 updates the LU management table 1310 and zone management table 1320. More specifically, for example, in the LU management table 1310, the swap program 13023 changes the HDD number and zone number corresponding to the swap source zone into the HDD number and zone number corresponding to the swap zone (that is, the copy destination zone) on the swap destination disk and, likewise, changes the HDD number and zone number corresponding to the swap destination zone to the HDD number and zone number corresponding to the swap zone (that is, the copy destination zone) on the swap source disk. Further, for example, in the zone management table 1320, the swap program 13023 changes the zone attributes corresponding to the swap source zone and swap destination zone from ‘Data’ to ‘Swap’ and changes the respective corresponding LU number and logical address space to ‘N/A’. In addition, the swap program 13023 changes the zone attribute corresponding to the swap zone on the swap destination disk from ‘Swap’ to ‘Data’ and changes the corresponding LU number and logical address space from ‘N/A’ to the LU number and logical address space corresponding to the swap source zone. In addition, the swap program 13023 changes the zone attribute corresponding to the swap zone on the swap source disk from ‘Swap’ to ‘Data’ and changes the corresponding LU number and logical address space from ‘N/A’ to the LU number and logical address space corresponding to the swap destination zone.

The swap program 13023 resets the number of commands and load ranking recorded in the zone load management table 1330 and the load ratio recorded in the disk load management table 1340 (all are returned to zero, for example) (step 605).
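The table updates that follow the two copies can be sketched as below. The sketch models only a simplified, hypothetical zone management table (a dict keyed by (HDD number, zone number) whose values hold the zone attribute, LU number and logical address space) and omits the data copies of steps 600 and 601 as well as the LU management table; `None` stands in for ‘N/A’.

```python
def swap_zones(zone_table, src, dst, src_swap, dst_swap):
    """src/dst: data zones being swapped; src_swap/dst_swap: the swap zones
    on the swap source disk and swap destination disk respectively."""
    # The swap zone on the destination disk takes over the source zone's mapping,
    # and the swap zone on the source disk takes over the destination zone's.
    zone_table[dst_swap] = dict(zone_table[src], attr="Data")
    zone_table[src_swap] = dict(zone_table[dst], attr="Data")
    # The vacated zones become the new swap zones of their disks.
    for old in (src, dst):
        zone_table[old] = {"attr": "Swap", "lu": None, "addr": None}
```

Because the old data zones become swap zones, each disk still has one swap zone after the exchange and the next swap can proceed in the same way.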

FIG. 14 shows an example of the flow of disk performance optimization processing performed in step 203 of FIG. 9. A description thereof will be suitably provided with reference to FIG. 15. Further, steps 700 to 705 in FIG. 15 correspond to steps 700 to 705 of FIG. 14.

The disk performance optimization program 1303 determines the optimal positions for data elements on a physical disk for which zone switching is required (a physical disk for which the zones at the inner periphery thereof have a higher load than zones at the outer periphery thereof, referred to hereinbelow as the ‘optimization target disk’) (step 700). More specifically, the disk performance optimization program 1303 references the zone load management table 1330 corresponding to the optimization target disk and copies the load ranking recorded in column 1333 of the table 1330 to column 1326 of the zone management table 1320 corresponding to the optimization target disk. The zone management table 1320 and zone load management table 1330 both have a plurality of records constituting the table (one data item corresponding to one zone) arranged in number order based on the zone number (in ascending order, for example). Further, the zone number and load ranking are both integers for which 0 is the minimum value. Hence, in the zone management table 1320, the copied load ranking is optimal position information and represents the number of the transfer destination zone.

Steps 701 to 706 hereinbelow are executed for each zone of the optimization target disk.

The disk performance optimization program 1303 judges whether the zone number and optimal position information match (step 701).

In cases where a match is detected in step 701 (step 701: Yes), because the zone corresponding to the zone number is the optimum storage position, the disk performance optimization program 1303 does not perform steps 702 to 705 and executes step 706.

However, in cases where a mismatch is detected in step 701 (step 701: No), the disk performance optimization program 1303 copies data element “0” that is stored in the current processing target zone (in a zone for which the zone number is “x” (where x is an integer) that is called ‘zone (A)’ hereinbelow) to the swap zone in the optimization target disk (step 702).

Thereafter, the disk performance optimization program 1303 searches for the zone storing the data elements that are to be stored in zone (A) (called ‘zone (B)’ hereinbelow), in other words, the zone for which the optimal position information has the same value as the zone number of zone (A) (step 703). In the example in FIG. 15, in cases where the zone number of zone (A) is “0”, the zone number of zone (B) is “1”. That is, zone 0 (the zone with zone number “0”) is zone (A) and zone 1 is zone (B) while zone 3 is the swap zone.

After the copying of step 702 has ended, the disk performance optimization program 1303 copies data element “1” stored in zone (B) to zone (A) (step 704). As a result, data element “0” stored in zone (A) is overwritten with data element “1”.

After the copying of step 704 has ended, the disk performance optimization program 1303 updates the LU management table 1310 and zone management table 1320 (step 705). More specifically, for example, the disk performance optimization program 1303 associates the LU number and logical address space corresponding to zone 0 with zone 3, associates the LU number and logical address space corresponding to zone 1 with zone 0, and associates the LU number and logical address space corresponding to zone 3 (here, zone 3 is the swap zone and, therefore, the LU number and logical address space are each ‘N/A’) with zone 1.

Thereafter, the disk performance optimization program 1303 ascertains whether the current processing target zone is the last zone (the zone with the zone number “N”, for example) (step 706). When the current processing target zone is not the last zone, the disk performance optimization program 1303 makes the next zone (the zone with the zone number “x+1”, for example) the processing target zone and executes the processing from step 701 onward. However, in cases where the current processing target zone is the last zone, the disk performance optimization program 1303 resets the load ratio, number of commands, and load ranking corresponding to the optimization target disk (step 707) and ends step 203. Subsequently, if an optimization target disk still remains, step 203 is executed for the other optimization target disk.
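The zone-by-zone rearrangement of steps 701 to 706 can be sketched as follows. The sketch tracks only where each data element ends up: `layout[z]` is assumed to hold the optimal position information (target zone number) of the data element currently stored in zone z, or `None` for the swap zone. The function name, this representation, and the treatment of a processing target zone that is itself the current swap zone are illustrative assumptions.

```python
def optimize_disk(layout):
    """Return the layout after every data element has been moved to its target zone."""
    layout = list(layout)
    swap = layout.index(None)                 # current swap zone
    for a in range(len(layout)):              # zone (A), processed in zone-number order
        # Step 701: skip if the zone is already optimal or no element targets it.
        if layout[a] == a or a not in layout:
            continue
        b = layout.index(a)                   # step 703: zone (B) holds the element for (A)
        if layout[a] is not None:
            layout[swap] = layout[a]          # step 702: save (A)'s element in the swap zone
        layout[a] = layout[b]                 # step 704: copy (B)'s element to (A)
        layout[b] = None                      # step 705: (B) becomes the new swap zone
        swap = b
    return layout
```

With `layout = [2, 0, 1, None]`, zone 0 is zone (A), zone 1 is zone (B) and zone 3 is the swap zone on the first pass, matching the roles described for FIG. 15; the final layout is `[0, 1, 2, None]`.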

According to the embodiment hereinabove, by performing a swap to make uniform the load between the physical disks 2100 and a swap to maximize the performance of a single physical disk while maintaining the RAID redundancy, a maximized performance can be expected for the storage system 10.

Although the preferred embodiments of the present invention were described hereinabove, these embodiments are examples serving to describe the present invention, there being no intention to limit the scope of the present invention to these embodiments alone. The present invention can also be implemented in a variety of other forms.

For example, as exemplified in FIG. 18, the number of zones constituting swap area 2300 in one physical disk 2100 may also be two or more. In this case, two or more swap zones may be used in parallel in the disk performance optimization processing. In other words, instead of step 203 being performed for one processing target zone, two or more processing target zones may be selected and step 203 may be performed for two or more processing target zones. As a result, the time required for the disk optimization processing of one physical disk 2100 can be shortened.

In addition, information such as the number of commands and the load ratio, for example, may be updated not only in the event of processing in accordance with I/O commands from the host computer 300 but also in cases where access to the physical disk 2100 occurs for another reason. For example, an update may also be made in cases where access for the replication or transfer of data between LUs takes place.

1. A storage system, comprising: a plurality of physical storage devices; a plurality of logical volumes formed based on a storage area group that constitutes the storage space of the plurality of physical storage devices; an access module that performs access, based on a RAID level corresponding to the logical volume, to each of the storage areas of each of two or more physical storage devices that constitute an access destination logical volume; a load calculation module that calculates the load of each of the physical storage devices based on a load that accompanies the access to each storage area; and a load distribution processing module which executes load distribution processing to distribute the loads of the plurality of physical storage devices and which comprises a judgment module and data re-arrangement module, wherein, in the load distribution processing, the judgment module selects one storage area from two or more storage areas of a high load physical storage device, selects a physical storage device with a lower load than that of the physical storage device, and judges whether the redundancy according to the RAID level decreases when data elements stored in the selected storage area are transferred to the selected low load physical storage device; and if the result of this judgment is that the redundancy does not decrease, the data re-arrangement module transfers the data elements stored in the selected storage area to a buffer area of the selected low load physical storage device, which is a storage area used as a buffer that does not correspond to a logical address space of the logical volume, and associates the logical address space of the logical volume that corresponds to the selected storage area with the buffer area.
2. The storage system according to claim 1, further comprising volume management information representing which storage area of which physical storage device each of the plurality of data elements written to volume element areas constituting the logical volume is written to, wherein the judgment of whether the redundancy decreases is a judgment that is performed by referencing the volume management information and is a judgment of whether the physical storage device having the selected storage area is the selected low load physical storage device, for the same volume element area.
3. The storage system according to claim 1, wherein after data elements have been transferred from the selected storage area by the data re-arrangement module, the storage area is established as the buffer area.
4. The storage system according to claim 1, wherein, in cases where the judgment result is that the redundancy decreases, the judgment module selects a physical storage device which has a lower load than that of the high load physical storage device and has a higher load than the physical storage device that is selected in the previous judgment.
5. The storage system according to claim 1, wherein, in cases where the high load physical storage device is a physical storage device with the Kth (K is an integer) highest load among the plurality of physical storage devices, the low load physical storage device that is initially selected is the physical storage device with the Kth lowest load among the plurality of physical storage devices.
6. The storage system according to claim 1, wherein the judgment module also judges, based on the load of the selected storage area, whether the load of the selected low load physical storage device exceeds a predetermined value in cases where the data elements are transferred, and if the load exceeds the predetermined value, selects a storage area with a lower load than that of the previously selected storage area, from the two or more storage areas of the high load physical storage device.
7. The storage system according to claim 1, wherein the data re-arrangement module transfers data elements that are stored in a storage area selected from two or more storage areas of the low load physical storage device to the buffer area of the high load physical storage device.
8. The storage system according to claim 1, wherein the judgment module also judges whether the load of a second storage area selected from the low load physical storage device is higher than the load of a first storage area selected from the high load physical storage device, and if the former is higher than the latter, selects a physical storage device which has a lower load than the high load physical storage device and which is other than the low load physical storage device.
9. The storage system according to claim 1, further comprising: a storage device optimization module that executes storage device optimization processing for a certain physical storage device among the plurality of physical storage devices, wherein, in the storage device optimization processing, for the certain physical storage device, data elements stored in a first storage area with high-speed access and a low load are copied to a buffer area of the certain physical storage device, and data elements stored in a second storage area with lower-speed access than the first storage area and with a high load are copied to the first storage area, and the logical address space associated with the second storage area is associated with the first storage area.
10. The storage system according to claim 9, wherein the storage device optimization processing is executed after the load distribution processing.
11. The storage system according to claim 9, wherein, in the storage device optimization processing, data elements stored in the second storage area are also copied to the first storage area, whereupon the second storage area is established as a buffer area.
12. The storage system according to claim 9, wherein the certain physical storage device is a disk-medium drive; and the first storage area is a storage area that exists closer to the outer periphery of the disk medium than the second storage area.
13. The storage system according to claim 9, further comprising: a dispersion extent judgment module that judges whether the dispersion extent of the load of the plurality of physical storage devices is equal to or less than a predetermined extent, wherein, when it is judged that the dispersion extent of the load is equal to or less than the predetermined extent, the storage device optimization processing is performed without performing the load distribution processing.
14. The storage system according to claim 9, wherein the certain physical storage device has two or more buffer areas, and the storage device optimization module performs storage device optimization processing by using the two or more buffer areas in parallel for two or more first storage devices of the certain physical storage device.
15. The storage system according to claim 9, wherein respective storage area identifiers of two or more storage areas of the certain physical storage device are serial numbers; and in cases where the load rankings of the respective storage areas are set as storage area identifiers, the copy destination of the data elements stored in the respective storage areas is a storage area that is identified from the storage area identifiers.
16. A method for optimizing the performance of a storage system having a plurality of physical storage devices, comprising the steps of: selecting a first storage area from two or more storage areas of a high load physical storage device; selecting a physical storage device with a lower load than that of the physical storage device; judging whether the redundancy according to the RAID level corresponding to a logical volume decreases when data elements stored in the selected storage area are transferred to the selected low load physical storage device; if the result of this judgment is that the redundancy does not decrease, transferring the data elements stored in the selected storage area to a buffer area of the selected low load physical storage device, which is a storage area used as a buffer which does not correspond to a logical address space of the logical volume; and associating the logical address space of the logical volume corresponding to the selected storage area with the buffer area.
17. A computer program for optimizing the performance of a storage system having a plurality of physical storage devices, comprising: selecting a first storage area from two or more storage areas of a high load physical storage device; selecting a physical storage device with a lower load than that of the physical storage device; judging whether the redundancy according to the RAID level corresponding to a logical volume decreases when data elements stored in the selected storage area are transferred to the selected low load physical storage device; if the result of this judgment is that the redundancy does not decrease, transferring the data elements stored in the selected storage area to a buffer area of the selected low load physical storage device, which is a storage area used as a buffer which does not correspond to a logical address space of the logical volume; and associating the logical address space of the logical volume corresponding to the selected storage area with the buffer area.